Transforming a Constituency Treebank into a Dependency Treebank

Authors

  • Alexander F. Gelbukh
  • Hiram Calvo
  • Sulema Torres
Abstract

We present a heuristic technique for converting a constituency treebank into a dependency treebank. In particular, we comment on our experience in converting the Spanish treebank Cast3LB. We extract a context-free grammar from the treebank, automatically identify the head in each rule, and use this information for constructing the dependency tree. Our heuristics have 99% precision and 80% recall in identifying the head in the rules, which gives 92% accuracy in identifying dependencies between words.
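The pipeline described in the abstract (find the head of each rule, then build the dependency tree from those heads) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `HEAD_PRIORITY` table, the rightmost-child fallback, the tag names, and the function names are all hypothetical, chosen only to show how head information turns a constituency tree into head-dependent pairs. Words are identified by their surface strings for brevity, so a real conversion would use token positions instead.

```python
from typing import Dict, List, Tuple

# A tree is either a leaf (pos_tag, word) or a node (label, [children]).

# Hypothetical head preferences per phrase label. Illustration only:
# these are NOT the heuristics reported in the paper.
HEAD_PRIORITY: Dict[str, List[str]] = {
    "S": ["VP", "V"],
    "VP": ["V"],
    "NP": ["N"],
}


def is_leaf(tree) -> bool:
    """A leaf pairs a POS tag with a word string."""
    return isinstance(tree[1], str)


def head_child_index(label: str, child_labels: List[str]) -> int:
    """Pick the head child of the rule label -> child_labels."""
    for wanted in HEAD_PRIORITY.get(label, []):
        for i, child_label in enumerate(child_labels):
            if child_label == wanted:
                return i
    # Assumed fallback when no preference matches: the rightmost child.
    return len(child_labels) - 1


def to_dependencies(tree, deps: List[Tuple[str, str]]) -> str:
    """Return the head word of tree; append (head, dependent) pairs to deps."""
    if is_leaf(tree):
        return tree[1]
    label, children = tree
    child_heads = [to_dependencies(child, deps) for child in children]
    h = head_child_index(label, [child[0] for child in children])
    # Every non-head child's head word depends on the head child's head word.
    for i, word in enumerate(child_heads):
        if i != h:
            deps.append((child_heads[h], word))
    return child_heads[h]


def convert(tree) -> Tuple[str, List[Tuple[str, str]]]:
    """Convert a constituency tree into (root word, dependency pairs)."""
    deps: List[Tuple[str, str]] = []
    root = to_dependencies(tree, deps)
    return root, deps
```

For a toy tree of "el niño come manzanas", with S expanding to NP VP, `convert` returns "come" as the root and the pairs (niño, el), (come, manzanas), (come, niño).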

Related resources

Automatic Conversion of a Persian Dependency Treebank into a Constituency Treebank

There are two major types of treebanks: dependency-based and constituency-based. Both have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, no large constituency treebank is available for this language. In this paper, we aim to propose an algorithm for automatic conversion of a depe...

Statistical French Dependency Parsing: Treebank Conversion and First Results

We first describe the automatic conversion of the French Treebank (Abeillé and Barrier, 2004), a constituency treebank, into typed projective dependency trees. In order to evaluate the overall quality of the resulting dependency treebank, and to quantify the cases where the projectivity constraint leads to wrong dependencies, we compare a subset of the converted treebank to manually validated d...

Converting SynTagRus Dependency Treebank into Penn Treebank Style

This paper presents the conversion of SynTagRus dependency structures into Penn Treebank style phrase structures, whose resulting data will be used to train a statistical constituency parser for Russian and create a large-scale constituency-parsed corpus. The implemented conversion includes various innovative features in order to create phrase structure trees that are closest to Penn Treebank s...

An Empirical Evaluation of Automatic Conversion from Constituency to Dependency in Hungarian

In this paper, we investigate the differences between Hungarian sentence parses based on automatically converted and manually annotated dependency trees. We also train constituency parsers on the manually annotated constituency treebank and then convert their output to dependency trees. We argue for the importance of training on gold standard corpora, and we also demonstrate that although the r...

Evalita’09 Parsing Task: constituency parsers and the Penn format for Italian

The aim of the Evalita Parsing Task is to define and extend the state of the art for parsing Italian by encouraging the application of existing models and approaches. Therefore, as in the first edition, the Task includes two tracks, i.e. dependency and constituency. This second track is based on a development set in a format that is an adaptation for Italian of the Penn Treebank format, and ...

Journal title:
  • Procesamiento del Lenguaje Natural

Volume 35  Issue 

Pages  -

Publication date 2005